29 research outputs found

    Multi-Sensor Event Detection using Shape Histograms

    Vehicular sensor data consists of multiple time-series arising from a number of sensors. Using such multi-sensor data, we would like to detect occurrences of specific events that vehicles encounter, e.g., corresponding to particular maneuvers that a vehicle makes or conditions that it encounters. Events are characterized by similar waveform patterns re-appearing within one or more sensors. Further, such patterns can be of variable duration. In this work, we propose a method for detecting such events in time-series data using a novel feature descriptor motivated by similar ideas in image processing. We define the shape histogram: a constant-dimension descriptor that nevertheless captures patterns of variable duration. We demonstrate the efficacy of using shape histograms as features to detect events in an SVM-based, multi-sensor, supervised learning scenario, i.e., multiple time-series are used to detect an event. We present results on real-life vehicular sensor data and show that our technique performs better than available pattern detection implementations on our data, and that it can also be used to combine features from multiple sensors, resulting in better accuracy than using any single sensor. Since previous work on pattern detection in time-series has been in the single-series context, we also present results using our technique on multiple standard time-series datasets and show that it is the most versatile in terms of how it ranks compared to other published results.
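    To make the idea concrete, here is a minimal Python sketch of a shape-histogram-style descriptor used with an SVM. The binning scheme (mean and spread of windowed slopes), the window size, and all parameters are illustrative assumptions, not the paper's actual descriptor; what it shows is how a constant-dimension histogram can summarize variable-length series and how per-sensor histograms can be concatenated for multi-sensor fusion.

```python
# Sketch of a shape-histogram descriptor, assuming shapes are summarized
# by the mean and spread of slopes inside sliding windows (an assumption).
import numpy as np
from sklearn.svm import SVC

def shape_histogram(series, window=8, n_bins=16):
    """Fixed-dimension descriptor for a variable-length time series."""
    slopes = np.diff(series)
    feats = []
    for start in range(len(slopes) - window + 1):
        w = slopes[start:start + window]
        feats.append((w.mean(), w.std()))      # crude "shape" of each window
    feats = np.array(feats)
    # Quantize the window shapes into a 2-D grid; flatten to a constant size.
    hist, _, _ = np.histogram2d(feats[:, 0], feats[:, 1],
                                bins=int(np.sqrt(n_bins)), density=True)
    return hist.ravel()

def event_features(multi_sensor_series):
    # Multi-sensor fusion: concatenate per-sensor histograms into one vector.
    return np.concatenate([shape_histogram(s) for s in multi_sensor_series])

# Toy usage with synthetic data: two sensors, variable series lengths.
rng = np.random.default_rng(0)
X = np.array([event_features([rng.standard_normal(100 + 10 * i % 30)
                              for _ in range(2)]) for i in range(40)])
y = rng.integers(0, 2, size=40)                # stand-in event labels
clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict(X[:5]))
```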

    A novel routing approach for source location privacy in wireless sensor networks

    Wireless sensor networks (WSNs) provide a technology for event supervision in several applications, including military and civilian ones. Network privacy has remained a prime concern in WSNs, and source location privacy is one of the main unresolved issues in WSN privacy. Privacy of the source location is vital and is highly jeopardized by the use of wireless communications. For WSNs, source location privacy is made more complex by the fact that sensor nodes are low-cost, energy-efficient radio devices, so computation-intensive encryption methods and large-scale broadcasting-based algorithms are unsuitable. Several schemes have been proposed to ensure source location privacy in WSNs, but most existing schemes depend on public-key cryptosystems, while others are either energy inefficient or have security flaws such as leakage of information through directional attacks or traffic analysis attacks. In this thesis, we propose a novel dynamic-routing-based approach for preserving source location privacy in WSNs, which injects fake packets into the network and switches the real packet information among different routing patterns. It addresses source location privacy while respecting the limited resources of WSNs. The major contributions of this work are two. First, unlike existing approaches, the proposed approach enhances the security of nodes with minimal transmission delay and consumes power with minimal effect on the lifetime of the network. Second, the proposed approach is designed to defend against many attacks, such as hop-by-hop and directional attacks, by dynamically choosing a suitable path to send information from a node to the base station (BS) without significantly affecting network lifetime. Thus, it becomes difficult for an attacker to find the exact path, and hence the original location of the node. The proposed approach is implemented and validated by comparing its results with those of existing approaches to source location privacy in terms of power consumption, transmission delay, safety period, and network lifetime. The comparative analysis indicates that the proposed approach is superior to the existing approaches in preserving source location privacy.
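    A toy sketch of the dynamic-routing idea follows: each hop picks a routing pattern at random and fake packets are injected with some probability, so an adversary tracing traffic sees no stable path back to the source. The pattern names, injection rate, and grid topology are illustrative assumptions, not the protocol parameters from the thesis.

```python
# Illustrative simulation of pattern-switching routing with fake-packet
# injection on a grid; all names and rates below are assumptions.
import random

PATTERNS = ["shortest_path", "random_walk", "axis_detour"]
FAKE_PACKET_RATE = 0.3  # assumed decoy-injection probability

def next_hop(node, bs, pattern):
    x, y = node
    step = lambda a, b: a + (b > a) - (b < a)   # one grid cell toward b
    if pattern == "shortest_path":
        return (step(x, bs[0]), step(y, bs[1]))
    if pattern == "random_walk":
        dx, dy = random.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        return (x + dx, y + dy)
    # axis_detour: finish one axis first, avoiding the straight diagonal.
    return (step(x, bs[0]), y) if x != bs[0] else (x, step(y, bs[1]))

def route(source, bs, max_hops=200):
    """Deliver one real packet, switching patterns per hop and counting
    the decoy (fake) packets injected along the way."""
    node, path, fakes = source, [source], 0
    while node != bs and len(path) < max_hops:
        pattern = random.choice(PATTERNS)        # per-hop pattern switch
        if random.random() < FAKE_PACKET_RATE:
            fakes += 1                           # decoy toward a random neighbor
        node = next_hop(node, bs, pattern)
        path.append(node)
    return path, fakes

path, fakes = route(source=(9, 9), bs=(0, 0))
print(len(path), "hops,", fakes, "fake packets injected")
```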

    TCS-ILAB - MediaEval 2015: Affective Impact of Movies and Violent Scene Detection

    ABSTRACT: This paper describes the participation of TCS-ILAB in the MediaEval 2015 Affective Impact of Movies Task (which includes Violent Scene Detection). We propose to detect the affective impact and the violent content in video clips using two different classification methodologies: a Bayesian network approach and an artificial neural network approach. Experiments with different combinations of features make up the five run submissions.

    SYSTEM DESCRIPTION

    Bayesian network based valence, arousal and violence detection. We describe the use of a Bayesian network (BN) for the detection of violence/non-violence and induced affect. Here, we learn the relationships between attributes of different types of features using a BN. Individual attributes such as colorfulness, shot length, or zero-crossing rate form the nodes of the BN, along with the valence, arousal, and violence labels, which are treated as categorical attributes. The primary objective of the BN-based approach is to discover cause-effect relationships between attributes, which are otherwise difficult to learn using other learning methods. This analysis helps in gaining knowledge of the internal processes of feature generation with respect to the labels in question, i.e., violence, valence, and arousal. In this work, we have used a publicly available Bayesian network learner [1], which gives us the network structure describing dependencies between different attributes. Using the discovered structure, we compute the conditional probabilities for the root and its cause attributes. Further, we perform inference of valence, arousal, and violence values for new observations using the junction-tree algorithm supported in the Dlib-ml library. Conditional probability computation is a relatively simple task for a network with few nodes, which is the case for image features. However, as the attribute set grows, the number of parameters, namely the conditional probability tables, grows exponentially. Since our major focus is on determining violence, valence, and arousal values with respect to unknown values of different features, we apply the D-separation principle.

    Artificial neural network based valence, arousal and violence detection. This section describes the system that uses artificial neural networks (ANNs) for classification. Two different methodologies are employed for the two subtasks. For both subtasks, the developed systems extract features from the video shots (including the audio) prior to classification.

    Feature extraction. The proposed system uses different sets of features, either from the available feature set (audio, video, and image) provided with the MediaEval dataset, or from our own set of extracted audio features. The designed system uses audio, image, and video features either separately or in combination. The audio features are extracted with the openSMILE toolkit [4] from the audio tracks of the video shots. openSMILE computes low-level descriptors (LLDs), followed by statistical functionals, to extract a meaningful and informative set of audio features. The feature set contains the following LLDs: intensity, loudness, 12 MFCCs, pitch (F0), voicing probability, F0 envelope, 8 LSFs (line spectral frequencies), and zero-crossing rate. Delta regression coefficients are computed from these LLDs, and the following functionals are applied to the LLDs and the delta coefficients: maximum and minimum value and their relative positions within the input, range, arithmetic mean, two linear regression coefficients with linear and quadratic error, standard deviation, skewness, kurtosis, quartiles, and three inter-quartile ranges. In its two configurations, openSMILE extracts 988 or 384 audio features (the latter configuration was earlier used for the Interspeech 2009 Emotion Challenge [5]). Both of these are reduced to a lower dimension after feature selection.

    Classification. For classification, we use an ANN trained on the development-set samples available for each subtask. As class imbalance exists for the violence detection task (only 4.4% of samples are violent), we train on an equal number of samples from both classes.
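    A hedged sketch of this ANN pipeline is given below: downsample the majority class so the violence labels are balanced, then train a small feed-forward network. The 384-dimensional feature size matches the openSMILE Interspeech 2009 configuration mentioned above, but the synthetic data, the network shape, and the use of scikit-learn are assumptions for illustration.

```python
# Class balancing by majority-class downsampling, then a small ANN;
# the features here are random stand-ins for openSMILE vectors.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 384))           # stand-in for 384-dim features
y = (rng.random(1000) < 0.044).astype(int)     # ~4.4% positive, as reported

# Keep all minority samples; downsample the majority class to match.
pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
keep = np.concatenate([pos, rng.choice(neg, size=len(pos), replace=False)])
X_bal, y_bal = X[keep], y[keep]

clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf.fit(X_bal, y_bal)
print(clf.predict(X[:10]))
```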

    Scene Text Detection Using Attention with Depthwise Separable Convolutions

    In spite of significant research efforts, existing scene text detection methods fall short of the challenges and requirements posed by real-life applications. In natural scenes, text segments exhibit a wide range of shape complexities, scale and font property variations, and they appear mostly incidentally. Furthermore, the computational requirement of the detector is an important factor for real-time operation. To address these issues, this paper presents a novel scene text detector using a deep convolutional network which efficiently detects arbitrarily oriented and complex-shaped text segments from natural scenes and predicts quadrilateral bounding boxes around them. The proposed network is designed as a U-shaped architecture with careful incorporation of skip connections to capture complex text attributes at multiple scales. To address the computational requirements of input processing, the proposed detector uses the MobileNet model, built on depthwise separable convolutions, as its backbone. The network design is integrated with text attention blocks, based on efficient channel attention, to enhance the learning ability of the detector. The network is trained in a multi-objective formulation supported by a novel text-aware non-maximal suppression procedure to generate the final text bounding box predictions. In extensive evaluations on the ICDAR2013, ICDAR2015, MSRA-TD500, and COCO-Text datasets, the paper reports detection F-scores of 0.910, 0.879, 0.830, and 0.617, respectively.
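    To make the named building blocks concrete, below is a minimal PyTorch sketch of a depthwise separable convolution (as in MobileNet) followed by efficient channel attention (ECA). Layer widths and the kernel size of the channel attention are illustrative; the paper's full U-shaped detector, skip connections, and prediction heads are not reproduced.

```python
# Depthwise separable convolution + ECA attention, as a building-block sketch.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # Per-channel spatial filtering, then 1x1 channel mixing.
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride, padding=1,
                                   groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

class ECA(nn.Module):
    """Efficient channel attention: 1-D conv over the pooled channel vector."""
    def __init__(self, k=3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, k, padding=k // 2, bias=False)

    def forward(self, x):
        w = x.mean(dim=(2, 3))                     # (B, C) global average pool
        w = self.conv(w.unsqueeze(1)).squeeze(1)   # local channel interactions
        return x * torch.sigmoid(w)[:, :, None, None]

block = nn.Sequential(DepthwiseSeparableConv(32, 64), ECA())
print(block(torch.randn(1, 32, 64, 64)).shape)     # torch.Size([1, 64, 64, 64])
```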

    Attention Guided Feature Encoding for Scene Text Recognition

    Real-life scene images exhibit a range of variations in text appearance, including complex shapes, variations in size, and fancy font properties. Consequently, text recognition from scene images remains a challenging problem in computer vision research. We present a scene text recognition methodology based on a novel feature-enhanced convolutional recurrent neural network architecture. Our work addresses scene text recognition as a sequence-to-sequence modeling problem, for which a novel deep encoder-decoder network is proposed. The encoder in the proposed network is designed around a hierarchy of convolutional blocks enabled with spatial attention, followed by bidirectional long short-term memory layers. In contrast to existing methods for scene text recognition, which incorporate temporal attention on the decoder side of the architecture, our convolutional architecture incorporates a novel spatial attention design to guide feature extraction toward textual details in scene text images. The experiments and analysis demonstrate that our approach learns robust text-specific feature sequences, as the convolutional architecture designed for feature extraction is tuned to capture a broader spatial text context. With extensive experiments on the ICDAR2013, ICDAR2015, IIIT5K, and SVT datasets, the paper demonstrates improvements over many important state-of-the-art methods.
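    Below is a hedged PyTorch sketch of the encoder idea: a convolutional block gated by a simple single-channel spatial attention mask, followed by a bidirectional LSTM over the resulting horizontal feature sequence. The shapes, layer counts, and attention design are assumptions; the paper's full encoder-decoder network is not reproduced.

```python
# Spatial-attention-gated convolutional encoder feeding a BiLSTM.
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Weights each spatial location by a learned single-channel mask."""
    def __init__(self, channels):
        super().__init__()
        self.mask = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):
        return x * torch.sigmoid(self.mask(x))

class AttnConvBiLSTMEncoder(nn.Module):
    def __init__(self, channels=64, hidden=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(),
            SpatialAttention(channels),
            nn.AdaptiveAvgPool2d((1, None)))       # collapse height to 1
        self.rnn = nn.LSTM(channels, hidden, bidirectional=True,
                           batch_first=True)

    def forward(self, img):                        # img: (B, 1, H, W)
        f = self.conv(img).squeeze(2)              # (B, C, W) feature columns
        seq, _ = self.rnn(f.transpose(1, 2))       # (B, W, 2*hidden)
        return seq

enc = AttnConvBiLSTMEncoder()
print(enc(torch.randn(2, 1, 32, 100)).shape)       # torch.Size([2, 100, 256])
```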

    Learning Feature Fusion in Deep Learning-Based Object Detector

    Object detection in real images is a challenging problem in computer vision. Despite several advancements in detection and recognition techniques, robust and accurate localization of objects of interest in images from real-life scenarios remains unsolved because of the difficulties posed by intra-class and inter-class variations, occlusion, lighting, and scale changes at different levels. In this work, we present an object detection framework based on learning-based fusion of handcrafted features with deep features. Deep features characterize different regions of interest in a test image with a rich set of statistical features. Our hypothesis is to reinforce these features with handcrafted features by learning the optimal fusion during network training. Our detection framework is based on a recent version of the YOLO object detection architecture. Experimental evaluation on the PASCAL-VOC and MS-COCO datasets achieved detection rate increases of 11.4% and 1.9% on the mAP scale in comparison with the YOLO version-3 detector (Redmon and Farhadi 2018). An important step in the proposed learning-based feature fusion strategy is to correctly identify the layer at which the new features are fed in. The present work shows a qualitative approach to identify the best layer for fusion, and design steps for feeding the additional feature sets into convolutional network-based detectors.
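    A minimal sketch of the general learning-based fusion strategy follows: deep and handcrafted features are projected to a common width and combined through a learnable gate, so training decides how much each source contributes. This illustrates the strategy in isolation; the paper's actual fusion layer inside the YOLO architecture is not reproduced.

```python
# Gated fusion of deep and handcrafted feature vectors; dimensions are
# illustrative assumptions.
import torch
import torch.nn as nn

class LearnedFusion(nn.Module):
    def __init__(self, deep_dim, hand_dim, out_dim):
        super().__init__()
        self.proj_deep = nn.Linear(deep_dim, out_dim)
        self.proj_hand = nn.Linear(hand_dim, out_dim)
        self.gate = nn.Sequential(nn.Linear(2 * out_dim, out_dim), nn.Sigmoid())

    def forward(self, deep, hand):
        d, h = self.proj_deep(deep), self.proj_hand(hand)
        g = self.gate(torch.cat([d, h], dim=-1))   # learned mixing weights
        return g * d + (1 - g) * h                 # convex per-unit blend

fuse = LearnedFusion(deep_dim=256, hand_dim=64, out_dim=128)
out = fuse(torch.randn(8, 256), torch.randn(8, 64))
print(out.shape)                                   # torch.Size([8, 128])
```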

    A hierarchical frame-by-frame association method based on graph matching for multi-object tracking

    Multiple object tracking is a challenging problem because of issues such as background clutter, camera motion, partial or full occlusions, and changes in object pose and appearance. Most existing algorithms use local and/or global association-based optimization between the detections and trackers to find correct object IDs. We propose a hierarchical frame-by-frame association method that exploits spatial layout consistency and inter-object relationships to resolve object identities across frames. The spatial-layout-consistency-based association is used as the first hierarchical step to identify easy targets. This is done by finding an MRF-MAP solution for a probabilistic graphical model using a minimum spanning tree over the object locations, with exact inference performed in polynomial time using belief propagation. For difficult targets, which cannot be resolved in the first step, a relative motion model is used to predict the state of occlusion for each target. This, along with information about the immediate neighbors of the target in the group, is used to resolve the identities of objects that are occluded either by other objects or by the background. The remaining unassociated difficult targets are finally resolved according to the state of the object, together with template matching based on SURF correspondences. Experiments on benchmark datasets show the superiority of our proposal over a greedy approach and find it competitive with state-of-the-art methods. The proposed concept of association is generic in nature and can easily be employed with other multi-object tracking algorithms.
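    As a rough illustration of the first association step, the sketch below builds a minimum spanning tree over object locations to define neighbor relations and then matches detections to trackers using a location-based cost. The MRF-MAP belief-propagation inference from the paper is replaced here by a simple Hungarian assignment for clarity; the distance threshold is an assumption.

```python
# MST over object locations as a spatial-layout structure, plus a simple
# one-to-one detection-to-tracker assignment.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import cdist
from scipy.optimize import linear_sum_assignment

def layout_tree(locations):
    """MST over object locations; its edges define neighbor relations."""
    mst = minimum_spanning_tree(cdist(locations, locations))
    return np.array(mst.nonzero()).T               # list of (i, j) edges

def associate(trackers, detections, max_dist=50.0):
    """Optimal one-to-one matching on Euclidean cost, gated by distance."""
    cost = cdist(trackers, detections)
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < max_dist]

trackers = np.array([[10.0, 10.0], [50.0, 40.0], [90.0, 90.0]])
detections = np.array([[12.0, 11.0], [52.0, 39.0], [300.0, 300.0]])
print(layout_tree(trackers))             # neighbor edges among tracked objects
print(associate(trackers, detections))   # [(0, 0), (1, 1)]; far detection unmatched
```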